N-Gram Language Model Compression Using Scalar Quantization and Incremental Coding
Authors
Abstract
This paper describes a novel approach to compressing large trigram language models, which uses scalar quantization to compress log probabilities and back-off coefficients, and incremental coding to compress entry pointers. Experiments show that the new approach achieves roughly 2.5 times the compression ratio of the well-known tree-bucket format while keeping perplexity and access speed almost unchanged. The high compression ratio enables our method to be used in various SLM-based applications, such as Pinyin input methods and dictation, on handheld devices with little available memory.
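The abstract names two ingredients: a scalar quantizer for log probabilities and back-off coefficients, and incremental (delta) coding for the entry pointers. The sketch below only illustrates those two ideas in Python; it is not the paper's implementation. The equal-frequency bucket construction, the 8-bit code width, and the helper names build_codebook / delta_encode are assumptions for illustration.

```python
import numpy as np

def build_codebook(values, bits=8):
    """Scalar-quantize a float array into 2**bits levels.

    A minimal sketch: codebook levels are the means of an equal-frequency
    partition of the sorted values (the paper does not specify its quantizer).
    """
    levels = 2 ** bits
    order = np.argsort(values)
    buckets = np.array_split(values[order], levels)
    codebook = np.array([b.mean() for b in buckets if len(b)])
    # Map every value to the index of its nearest codebook level.
    codes = np.abs(values[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, codes.astype(np.uint8)

def delta_encode(pointers):
    """Incremental (delta) coding of monotonically increasing entry pointers."""
    pointers = np.asarray(pointers, dtype=np.int64)
    return np.diff(pointers, prepend=0)   # small gaps, cheap to store in few bits

# Example: quantize stand-in trigram log probabilities, delta-code pointers.
logprobs = np.random.uniform(-8.0, 0.0, size=10_000)
codebook, codes = build_codebook(logprobs, bits=8)
restored = codebook[codes]                                   # lossy reconstruction
pointer_gaps = delta_encode(np.cumsum(np.random.randint(1, 40, size=1000)))
```

In this sketch the model would store one byte per probability plus the shared 256-entry codebook, and small pointer gaps instead of full offsets, which is where the memory savings come from.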
Similar papers
Coding Algorithm Based on Loss Compressing using Scalar Quantization Switching Technique and Logarithmic Companding
This paper proposes a novel coding algorithm based on loss compression using scalar quantization switching technique. The algorithm of switching is performed by the estimating input variance and further coding with Nonuniform Switched Scalar Compandor (NSSC). An accurate estimation of the input signal variance is needed when finding the best compressor function for a compandor implementation. I...
Quantization techniques pdf
This paper proposes some fast and simple quantization techniques for display. The main reason for adopting different techniques in vector quantizers (VQ) is to design an optimal quantizer, since the actual probability distributions of image... Abstract: Image compression is a technique for efficiently coding digital data. Vector quantization (VQ) is a block-coding technique that quantizes blocks of data. Defi...
Color Video Compression Based on Chrominance Vector Quantization
This paper proposes a compression technique to improve the quality of color in very low bit rate video coding. The general idea is to convert the two chrominance components into one scalar chrominance value, which is processed further. The scalar representation of chrominance is obtained through vector quantization in the chrominance plane. Each (CB, CR) vector is represented by a scalar index to a ...
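The snippet reduces the two chrominance components to a single scalar index via a codebook in the (CB, CR) plane. A minimal sketch of that lookup follows; the tiny k-means training loop and the 256-entry codebook size are assumed stand-ins, not the paper's procedure.

```python
import numpy as np

def train_codebook(chroma_vectors, size=256, iters=10, seed=0):
    """Small Lloyd/k-means codebook in the (Cb, Cr) plane (assumed training)."""
    rng = np.random.default_rng(seed)
    codebook = chroma_vectors[rng.choice(len(chroma_vectors), size, replace=False)]
    for _ in range(iters):
        # Assign each (Cb, Cr) vector to its nearest codeword.
        d = ((chroma_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for k in range(size):
            members = chroma_vectors[assign == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def chroma_to_index(chroma_vectors, codebook):
    """Replace each (Cb, Cr) pair by the scalar index of its nearest codeword."""
    d = ((chroma_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1).astype(np.uint8)

# Example with random chrominance samples standing in for a real frame.
cbcr = np.random.uniform(16, 240, size=(5000, 2))
codebook = train_codebook(cbcr, size=256)
indices = chroma_to_index(cbcr, codebook)          # one scalar per pixel
reconstructed = codebook[indices]                  # back to (Cb, Cr) pairs
```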
Wavelet Transform Coding With Linear Prediction And The Optimal Choice Of Wavelet Basis
Wavelet transform based coding has been shown to be a promising method for low bit rate data compression. By using its multiresolution characteristics and the dependencies among subbands, important visual features can be reconstructed at high compression ratios. In this paper, we propose a new wavelet transform coding scheme that exploits a linear prediction model for the existing dependencies ...
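The idea of exploiting inter-subband dependencies with linear prediction can be illustrated with a very small example: a two-level 1-D Haar decomposition, a least-squares scalar predictor from the coarse detail band to the fine one, and quantization of the residual. Everything here (Haar, the scalar predictor, the step size) is an assumed simplification, not the scheme from that paper.

```python
import numpy as np

def haar_dwt_1d(x):
    """One level of the 1-D Haar transform: approximation and detail bands."""
    x = np.asarray(x, dtype=np.float64)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

# Two-level decomposition: fine detail band d1 and coarse detail band d2.
signal = np.cumsum(np.random.randn(1024))          # smooth-ish test signal
a1, d1 = haar_dwt_1d(signal)
a2, d2 = haar_dwt_1d(a1)

# Linear prediction across subbands: predict each pair of fine-band
# coefficients from its "parent" coarse-band coefficient (least-squares fit),
# then quantize and code only the residual.
parents = np.repeat(d2, 2)[: len(d1)]
w = (parents @ d1) / (parents @ parents + 1e-12)   # scalar LS predictor
residual = d1 - w * parents                        # what actually gets coded
step = 0.5                                         # assumed quantizer step
coded = np.round(residual / step).astype(np.int32)
decoded_d1 = w * parents + coded * step            # decoder reconstruction
```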
A Succinct N-gram Language Model
Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N-gram language models based on LOUDS, a succinct data structure. LOUDS succinctly represents a trie with M nodes as a (2M + 1)-bit string. We compress it further for the N-gram language model structure. We also use 'variable-length coding' and 'block-wise compression' to comp...
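LOUDS (level-order unary degree sequence) writes, for each trie node in breadth-first order, its number of children in unary; with the conventional "10" super-root prefix this gives exactly 2M + 1 bits for M nodes. A minimal sketch of building that bit string from a nested-dict trie follows; the trie representation and the function name are assumptions for illustration.

```python
from collections import deque

def louds_bits(trie):
    """Build the LOUDS bit string for a trie given as nested dicts.

    Convention: a "10" super-root prefix, then for each node in level order
    append one '1' per child followed by a terminating '0'. For a trie with
    M nodes this yields exactly 2M + 1 bits.
    """
    bits = ["10"]                      # super-root pointing at the real root
    queue = deque([trie])              # breadth-first traversal of nodes
    while queue:
        node = queue.popleft()
        children = sorted(node.items())        # deterministic child order
        bits.append("1" * len(children) + "0")
        queue.extend(child for _, child in children)
    return "".join(bits)

# Example: a tiny trie over the keys "ab", "ac", "b" (5 nodes -> 11 bits).
trie = {"a": {"b": {}, "c": {}}, "b": {}}
print(louds_bits(trie))                # "10110110000", i.e. 2 * 5 + 1 bits
```

Navigation (first child, next sibling, parent) is then done with rank/select queries over this bit string, which is what keeps the trie both compact and traversable.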